Stochastic Spectral Descent for Restricted Boltzmann Machines
Authors
Abstract
Restricted Boltzmann Machines (RBMs) are widely used as building blocks for deep learning models. Learning typically proceeds by stochastic gradient descent, with the gradients estimated by sampling methods. However, gradient estimation is a computational bottleneck, so making better use of the gradients will speed up the descent algorithm. To this end, we first derive upper bounds on the RBM cost function, then show that descent methods can have natural advantages by operating in the ℓ∞ and Schatten-∞ norms. We introduce a new method called "Stochastic Spectral Descent" that updates parameters in the normed space. Empirical results show dramatic improvements over stochastic gradient descent, with only a fractional increase in per-iteration cost.
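As a rough illustration of what "updating in the normed space" means, the sketch below builds a descent step from the ♯-operator associated with a norm: for the ℓ∞ norm the step follows the sign of the gradient scaled by its ℓ1 norm, and for the Schatten-∞ (spectral) norm it follows UVᵀ from the gradient's SVD scaled by the nuclear norm. This is a minimal numpy sketch; the function names and the fixed step size L are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sharp_linf(g):
    """#-operator for the l_inf norm: argmax over ||y||_inf <= 1 of <g, y>
    (which is sign(g)), scaled by the dual (l_1) norm of g."""
    return np.sum(np.abs(g)) * np.sign(g)

def sharp_schatten_inf(G):
    """#-operator for the Schatten-inf (spectral) norm: the dual norm is the
    nuclear norm, and the maximizer of <G, Y> over ||Y||_2 <= 1 is U V^T."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return np.sum(s) * (U @ Vt)

def ssd_step(W, grad_W, L):
    """One descent step W <- W - (1/L) * grad_W^#, here in the Schatten-inf norm."""
    return W - (1.0 / L) * sharp_schatten_inf(grad_W)

# Toy usage on a random gradient estimate (e.g., from sampling in an RBM).
rng = np.random.default_rng(0)
W = rng.normal(size=(20, 10))
grad = rng.normal(size=(20, 10))
W = ssd_step(W, grad, L=100.0)
```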
Similar Papers
Unifying the Stochastic Spectral Descent for Restricted Boltzmann Machines with Bernoulli or Gaussian Inputs
Stochastic gradient descent-based algorithms are typically used as the general optimization tools for most deep learning models. A Restricted Boltzmann Machine (RBM) is a probabilistic generative model that can be stacked to construct deep architectures. For RBMs with Bernoulli inputs, non-Euclidean algorithms such as stochastic spectral descent (SSD) have been specifically designed to speed up th...
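For reference, the Bernoulli/Gaussian distinction above concerns the model of the visible units, which changes the RBM's free energy. The minimal sketch below writes out both free energies (unit-variance Gaussian case); variable names follow common RBM conventions and are assumptions, not taken from the paper.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)  # numerically stable log(1 + e^x)

def free_energy_bernoulli(v, W, b, c):
    """F(v) = -b^T v - sum_j log(1 + exp(c_j + v^T W_j))."""
    return -v @ b - softplus(c + v @ W).sum()

def free_energy_gaussian(v, W, b, c):
    """Unit-variance Gaussian visibles:
    F(v) = ||v - b||^2 / 2 - sum_j log(1 + exp(c_j + v^T W_j))."""
    return 0.5 * np.sum((v - b) ** 2) - softplus(c + v @ W).sum()
```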
Experiments with Stochastic Gradient Descent: Condensations of the Real line
It is well-known that training Restricted Boltzmann Machines (RBMs) can be difficult in practice. In the realm of stochastic gradient methods, several tricks have been used to obtain faster convergence. These include gradient averaging (known as momentum), averaging the parameters w, and different schedules for decreasing the “learning rate” parameter. In this article, we explore the use of con...
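The three tricks mentioned, gradient averaging (momentum), averaging the parameters w, and a decreasing learning-rate schedule, compose naturally in one loop. Below is a minimal hedged sketch; `grad_fn`, the 1/t schedule, and all constants are illustrative assumptions.

```python
import numpy as np

def sgd_with_tricks(grad_fn, w0, steps=1000, lr0=0.1, beta=0.9):
    """SGD with momentum, parameter averaging, and a 1/t learning-rate decay."""
    w = w0.copy()
    velocity = np.zeros_like(w)
    w_avg = w0.copy()
    for t in range(1, steps + 1):
        lr = lr0 / t                             # decreasing "learning rate"
        velocity = beta * velocity + grad_fn(w)  # gradient averaging (momentum)
        w = w - lr * velocity
        w_avg += (w - w_avg) / t                 # running average of the parameters w
    return w, w_avg
```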
Sequential Labeling with online Deep Learning
In this paper, we leverage both deep learning and conditional random fields (CRFs) for sequential labeling. More specifically, we propose a mixture objective function to predict labels that are either independent or correlated in the sequential patterns. We learn model parameters in a simple but effective way. In particular, we pretrain the deep structure with greedy layer-wise restricted Boltzmann mach...
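One plausible reading of such a mixture objective is a weighted sum of an independent per-position softmax loss and a linear-chain CRF negative log-likelihood. The sketch below is an assumption-laden illustration of that idea (the weight `alpha` and all names are hypothetical), not the paper's exact formulation.

```python
import numpy as np

def log_softmax(x, axis=-1):
    return x - np.logaddexp.reduce(x, axis=axis, keepdims=True)

def independent_nll(emissions, labels):
    """Sum of per-position cross-entropies, treating labels as independent."""
    logp = log_softmax(emissions)
    return -sum(logp[t, y] for t, y in enumerate(labels))

def crf_nll(emissions, transitions, labels):
    """Linear-chain CRF negative log-likelihood via the forward algorithm."""
    T, K = emissions.shape
    fwd = emissions[0].copy()
    for t in range(1, T):
        fwd = emissions[t] + np.logaddexp.reduce(fwd[:, None] + transitions, axis=0)
    log_Z = np.logaddexp.reduce(fwd)
    score = emissions[0, labels[0]] + sum(
        transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
        for t in range(1, T))
    return log_Z - score

def mixture_loss(emissions, transitions, labels, alpha=0.5):
    """Weighted mixture of the independent and the correlated (CRF) losses."""
    return (alpha * independent_nll(emissions, labels)
            + (1 - alpha) * crf_nll(emissions, transitions, labels))
```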
Restricted Boltzmann Machines with Gaussian Visible Units Guided by Pairwise Constraints
Restricted Boltzmann machines (RBMs) and their variants are usually trained by contrastive divergence (CD) learning, but the training procedure is an unsupervised learning approach without any guidance from background knowledge. To enhance the expressive ability of traditional RBMs, in this paper we propose a pairwise-constraint restricted Boltzmann machine with Gaussian visible units (pcGR...
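As background for the CD training mentioned above, here is a minimal sketch of one CD-1 update for an RBM with unit-variance Gaussian visible units and Bernoulli hidden units. It deliberately omits the paper's pairwise-constraint guidance and shows only the standard unsupervised baseline; all names are conventional assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr, rng):
    """One CD-1 update for a Gaussian-Bernoulli RBM, given one data vector v0."""
    # Positive phase: hidden probabilities given the real-valued data.
    h0 = sigmoid(c + v0 @ W)
    # Negative phase: sample hiddens, reconstruct the Gaussian visibles by
    # their conditional mean, then recompute hidden probabilities.
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = b + h_sample @ W.T
    h1 = sigmoid(c + v1 @ W)
    # Approximate log-likelihood gradient (data term minus model term).
    W = W + lr * (np.outer(v0, h0) - np.outer(v1, h1))
    b = b + lr * (v0 - v1)
    c = c + lr * (h0 - h1)
    return W, b, c
```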
Adaptive dropout for training deep neural networks
Recently, it was shown that deep neural networks can perform very well if the activities of hidden units are regularized during learning, e.g., by randomly dropping out 50% of their activities. We describe a method called 'standout' in which a binary belief network is overlaid on a neural network and is used to regularize its hidden units by selectively setting activities to zero. This 'adapt...
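A minimal sketch of the 'standout' idea as described above: an overlay network computes a per-unit keep probability from the layer input, and a sampled binary mask selectively zeroes hidden activities. Using separate overlay weights (Ws, bs) is an assumption for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def standout_layer(x, W, b, Ws, bs, rng):
    """Forward pass with adaptive dropout: the overlay (Ws, bs) decides, per
    hidden unit and per input, the probability of keeping the activity."""
    h = sigmoid(x @ W + b)            # ordinary hidden activities
    keep_prob = sigmoid(x @ Ws + bs)  # binary belief network overlay
    mask = (rng.random(h.shape) < keep_prob).astype(float)
    return h * mask                   # selectively set activities to zero
```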